Search | VHL Regional Portal

1.

Accurate sequencing of DNA motifs able to form alternative (non-B) structures.

Weissensteiner, Matthias H; Cremona, Marzia A; Guiblet, Wilfried M; Stoler, Nicholas; Harris, Robert S; Cechova, Monika; Eckert, Kristin A; Chiaromonte, Francesca; Huang, Yi-Fei; Makova, Kateryna D.

Genome Res ; 33(6): 907-922, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37433640

ABSTRACT

Approximately 13% of the human genome at certain motifs have the potential to form noncanonical (non-B) DNA structures (e.g., G-quadruplexes, cruciforms, and Z-DNA), which regulate many cellular processes but also affect the activity of polymerases and helicases. Because sequencing technologies use these enzymes, they might possess increased errors at non-B structures. To evaluate this, we analyzed error rates, read depth, and base quality of Illumina, Pacific Biosciences (PacBio) HiFi, and Oxford Nanopore Technologies (ONT) sequencing at non-B motifs. All technologies showed altered sequencing success for most non-B motif types, although this could be owing to several factors, including structure formation, biased GC content, and the presence of homopolymers. Single-nucleotide mismatch errors had low biases in HiFi and ONT for all non-B motif types but were increased for G-quadruplexes and Z-DNA in all three technologies. Deletion errors were increased for all non-B types but Z-DNA in Illumina and HiFi, as well as only for G-quadruplexes in ONT. Insertion errors for non-B motifs were highly, moderately, and slightly elevated in Illumina, HiFi, and ONT, respectively. Additionally, we developed a probabilistic approach to determine the number of false positives at non-B motifs depending on sample size and variant frequency, and applied it to publicly available data sets (1000 Genomes, Simons Genome Diversity Project, and gnomAD). We conclude that elevated sequencing errors at non-B DNA motifs should be considered in low-read-depth studies (single-cell, ancient DNA, and pooled-sample population sequencing) and in scoring rare variants. Combining technologies should maximize sequencing accuracy in future studies of non-B DNA.

Subjects

DNA, Z-Form; Nanopores; Humans; Nucleotide Motifs; Sequence Analysis, DNA; DNA/genetics; Base Composition; High-Throughput Nucleotide Sequencing
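
The abstract above mentions a probabilistic approach for estimating false positives but does not describe it in this listing. As a rough illustration only, and not the authors' method, the sketch below assumes a simple binomial model: each read at a site independently shows a given erroneous allele with probability `error_rate`, and a variant is called when at least `min_alt_reads` reads support it. All function and parameter names are hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), via the complement of the lower tail."""
    lower_tail = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - lower_tail


def expected_false_positives(n_sites, depth, error_rate, min_alt_reads):
    """Expected number of sites falsely called as variant.

    Assumption (illustrative only): sites are independent and share the
    same read depth and per-read error rate.
    """
    return n_sites * prob_at_least(min_alt_reads, depth, error_rate)


if __name__ == "__main__":
    # Hypothetical numbers: 1,000 motif sites, 10x depth, 1% error rate,
    # and a caller that needs at least 2 supporting reads.
    print(expected_false_positives(1000, 10, 0.01, 2))
```

Under this toy model, lowering the read depth makes each erroneous read a larger fraction of the evidence at a site, which is consistent with the abstract's caution about low-read-depth studies.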

2.

Sequencing error profiles of Illumina sequencing instruments.

Stoler, Nicholas; Nekrutenko, Anton.

NAR Genom Bioinform ; 3(1): lqab019, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33817639

ABSTRACT

Sequencing technology has achieved great advances in the past decade. Studies have previously shown the quality of specific instruments in controlled conditions. Here, we developed a method able to retroactively determine the error rate of most public sequencing datasets. To do this, we utilized the overlaps between reads that are a feature of many sequencing libraries. With this method, we surveyed 1943 different datasets from seven different sequencing instruments produced by Illumina. We show that among public datasets, the more expensive platforms like HiSeq and NovaSeq have a lower error rate and less variation. But we also discovered that there is great variation within each platform, with the accuracy of a sequencing experiment depending greatly on the experimenter. We show the importance of sequence context, especially the phenomenon where preceding bases bias the following bases toward the same identity. We also show the difference in patterns of sequence bias between instruments. Contrary to expectations based on the underlying chemistry, HiSeq X Ten and NovaSeq 6000 share notable exceptions to the preceding-base bias. Our results demonstrate the importance of the specific circumstances of every sequencing experiment, and the importance of evaluating the quality of each one.
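
The overlap idea in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes the overlapping portions of each read pair have already been extracted, placed in the same orientation, and aligned without indels. Because a mismatch between the two reads means at least one of them is wrong, the per-base error rate is approximated as mismatches divided by twice the number of compared positions. The function name is hypothetical.

```python
def overlap_error_rate(overlaps):
    """Estimate a per-base error rate from overlapping read pairs.

    `overlaps` is an iterable of (read1_segment, read2_segment) strings of
    equal length, already in the same orientation (assumption).
    """
    mismatches = 0
    compared = 0
    for seg1, seg2 in overlaps:
        if len(seg1) != len(seg2):
            raise ValueError("overlap segments must have equal length")
        for base1, base2 in zip(seg1, seg2):
            # Skip positions where either read has no base call.
            if base1 == "N" or base2 == "N":
                continue
            compared += 1
            if base1 != base2:
                mismatches += 1
    if compared == 0:
        return 0.0
    # Each compared position contains two sequenced bases, one per read.
    return mismatches / (2 * compared)


if __name__ == "__main__":
    pairs = [("ACGT", "ACGA"), ("GGGG", "GGGG")]
    print(overlap_error_rate(pairs))  # 1 mismatch over 8 positions -> 0.0625
```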

3.

Erratum: Increased yields of duplex sequencing data by a series of quality control tools.

Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Stoler, Nicholas; Nekrutenko, Anton; Tiemann-Boege, Irene.

NAR Genom Bioinform ; 3(1): lqab014, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33709076

ABSTRACT

[This corrects the article DOI: 10.1093/nargab/lqab002.].

4.

Increased yields of duplex sequencing data by a series of quality control tools.

Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Stoler, Nicholas; Nekrutenko, Anton; Tiemann-Boege, Irene.

NAR Genom Bioinform ; 3(1): lqab002, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33575654

ABSTRACT

Duplex sequencing is currently the most reliable method to identify ultra-low frequency DNA variants by grouping sequence reads derived from the same DNA molecule into families with information on the forward and reverse strand. However, only a small proportion of reads are assembled into duplex consensus sequences (DCS), and reads with potentially valuable information are discarded at different steps of the bioinformatics pipeline, especially reads without a family. We developed a bioinformatics toolset that analyses the tag and family composition in order to understand data loss and implement modifications that maximize the data output for variant calling. Specifically, our tools show that tags contain polymerase chain reaction and sequencing errors that contribute to data loss and lower DCS yields. Our tools also identified chimeras, which likely reflect barcode collisions. Finally, we also developed a tool that re-examines variant calls from raw reads and provides summary data that categorize the confidence level of a variant call by a tier-based system. With this tool, we can include reads without a family and check the reliability of the call, which substantially increases the sequencing depth for variant calling, a particularly important advantage for low-input samples or low-coverage regions.

5.

Age-related accumulation of de novo mitochondrial mutations in mammalian oocytes and somatic tissues.

Arbeithuber, Barbara; Hester, James; Cremona, Marzia A; Stoler, Nicholas; Zaidi, Arslan; Higgins, Bonnie; Anthony, Kate; Chiaromonte, Francesca; Diaz, Francisco J; Makova, Kateryna D.

PLoS Biol ; 18(7): e3000745, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32667908

ABSTRACT

Mutations create genetic variation for other evolutionary forces to operate on and cause numerous genetic diseases. Nevertheless, how de novo mutations arise remains poorly understood. Progress in the area is hindered by the fact that error rates of conventional sequencing technologies (1 in 100 or 1,000 base pairs) are several orders of magnitude higher than de novo mutation rates (1 in 10,000,000 or 100,000,000 base pairs per generation). Moreover, previous analyses of germline de novo mutations examined pedigrees (and not germ cells) and thus were likely affected by selection. Here, we applied highly accurate duplex sequencing to detect low-frequency, de novo mutations in mitochondrial DNA (mtDNA) directly from oocytes and from somatic tissues (brain and muscle) of 36 mice from two independent pedigrees. We found mtDNA mutation frequencies 2- to 3-fold higher in 10-month-old than in 1-month-old mice, demonstrating mutation accumulation during the period of only 9 mo. Mutation frequencies and patterns differed between germline and somatic tissues and among mtDNA regions, suggestive of distinct mutagenesis mechanisms. Additionally, we discovered a more pronounced genetic drift of mitochondrial genetic variants in the germline of older versus younger mice, arguing for mtDNA turnover during oocyte meiotic arrest. Our study deciphered for the first time the intricacies of germline de novo mutagenesis using duplex sequencing directly in oocytes, which provided unprecedented resolution and minimized selection effects present in pedigree studies. Moreover, our work provides important information about the origins and accumulation of mutations with aging/maturation and has implications for delayed reproduction in modern human societies. Furthermore, the duplex sequencing method we optimized for single cells opens avenues for investigating low-frequency mutations in other studies.

Subjects

Aging/genetics; Mammals/genetics; Mitochondria/genetics; Mutation/genetics; Oocytes/metabolism; Organ Specificity/genetics; Animals; DNA Mutational Analysis; DNA, Mitochondrial/genetics; Female; Gene Frequency/genetics; Genetic Drift; Germ Cells/metabolism; Inheritance Patterns/genetics; Logistic Models; Male; Mice; Models, Genetic; Mutation Rate; Nucleotides/genetics; Pedigree

6.

Family reunion via error correction: an efficient analysis of duplex sequencing data.

Stoler, Nicholas; Arbeithuber, Barbara; Povysil, Gundula; Heinzl, Monika; Salazar, Renato; Makova, Kateryna D; Tiemann-Boege, Irene; Nekrutenko, Anton.

BMC Bioinformatics ; 21(1): 96, 2020 Mar 04.

Article in English | MEDLINE | ID: mdl-32131723

ABSTRACT

BACKGROUND: Duplex sequencing is the most accurate approach for identification of sequence variants present at very low frequencies. Its power comes from pooling together multiple descendants of both strands of original DNA molecules, which allows distinguishing true nucleotide substitutions from PCR amplification and sequencing artifacts. This strategy comes at a cost: sequencing the same molecule multiple times increases dynamic range but significantly diminishes coverage, making whole genome duplex sequencing prohibitively expensive. Furthermore, every duplex experiment produces a substantial proportion of singleton reads that cannot be used in the analysis and are thrown away. RESULTS: In this paper we demonstrate that a significant fraction of these reads contains PCR or sequencing errors within duplex tags. Correction of such errors allows "reuniting" these reads with their respective families, increasing the output of the method and making it more cost effective. CONCLUSIONS: We combine an error correction strategy with a number of algorithmic improvements in a new version of the duplex analysis software, Du Novo 2.0. It is written in Python, C, AWK, and Bash. It is open source and readily available through Galaxy, Bioconda, and GitHub: https://github.com/galaxyproject/dunovo.

Subjects

User-Computer Interface; Algorithms; DNA/chemistry; DNA/metabolism; Humans; Sequence Alignment; Sequence Analysis, DNA
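
The idea of "reuniting" reads whose tags contain errors can be illustrated with a small sketch. This is not Du Novo's actual algorithm, which the abstract does not detail; it is a greedy stand-in that assumes tags of equal length and merges each tag into the most abundant tag within a fixed number of mismatches (Hamming distance). All names below are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def reunite_tags(tag_counts, max_dist=1):
    """Map every tag to a representative 'parent' tag.

    Tags are visited from most to least abundant; a tag joins the first
    existing parent within `max_dist` mismatches, otherwise it becomes a
    new parent itself.
    """
    ordered = sorted(tag_counts, key=lambda t: (-tag_counts[t], t))
    parents = []
    mapping = {}
    for tag in ordered:
        for parent in parents:
            if len(parent) == len(tag) and hamming(parent, tag) <= max_dist:
                mapping[tag] = parent
                break
        else:
            parents.append(tag)
            mapping[tag] = tag
    return mapping


def family_sizes(tag_counts, mapping):
    """Total read count per family after tags are merged."""
    sizes = Counter()
    for tag, count in tag_counts.items():
        sizes[mapping[tag]] += count
    return dict(sizes)


if __name__ == "__main__":
    counts = {"AAAA": 5, "AAAT": 1, "CCCC": 3, "GGGG": 1}
    mapping = reunite_tags(counts)
    print(family_sizes(counts, mapping))
```

In this toy input, the singleton `AAAT` is absorbed into the `AAAA` family, while `GGGG` remains a singleton because no other tag is within one mismatch.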

7.

A 1,000-Year-Old RNA Virus.

Peyambari, Mahtab; Warner, Sylvia; Stoler, Nicholas; Rainer, Drew; Roossinck, Marilyn J.

J Virol ; 93(1), 2019 Jan 01.

Article in English | MEDLINE | ID: mdl-30305356

ABSTRACT

Only a few RNA viruses have been discovered from archaeological samples, the oldest dating from about 750 years ago. Using ancient maize cobs from Antelope House, Arizona, dating from ca. 1,000 CE, we discovered a novel plant virus with a double-stranded RNA genome. The virus is a member of the family Chrysoviridae that infect plants and fungi in a persistent manner. The extracted double-stranded RNA from 312 maize cobs was converted to cDNA, and sequences were determined using an Illumina HiSeq 2000. Assembled contigs from many samples showed similarity to Anthurium mosaic-associated virus and Persea americana chrysovirus, putative species in the Chrysovirus genus, and nearly complete genomes were found in three ancient maize samples. We named this new virus Zea mays chrysovirus 1. Using specific primers, we were able to recover sequences of a closely related virus from modern maize and obtained the nearly complete sequences of the three genomic RNAs. Comparing the nucleotide sequences of the three genomic RNAs of the modern and ancient viruses showed 98, 96.7, and 97.4% identities, respectively. Hence, in 1,000 years of maize cultivation, this virus has undergone about 3% divergence. IMPORTANCE: A virus related to plant chrysoviruses was found in numerous ancient samples of maize, with nearly complete genomes in three samples. The age of the ancient samples (i.e., about 1,000 years old) was confirmed by carbon dating. Chrysoviruses are persistent plant viruses. They infect their hosts from generation to generation by transmission through seeds and can remain in their hosts for very long time periods. When modern corn samples were analyzed, a closely related chrysovirus was found with only about 3% divergence from the ancient sequences. This virus represents the oldest known plant virus.

Subjects

Geologic Sediments/virology; Plant Viruses/classification; RNA, Double-Stranded/genetics; Zea mays/virology; Arizona; Evolution, Molecular; Genome Size; High-Throughput Nucleotide Sequencing; Phylogeny; Plant Viruses/isolation & purification; RNA Viruses/genetics; Sequence Analysis, DNA; Sequence Analysis, RNA

8.

Streamlined analysis of duplex sequencing data with Du Novo.

Stoler, Nicholas; Arbeithuber, Barbara; Guiblet, Wilfried; Makova, Kateryna D; Nekrutenko, Anton.

Genome Biol ; 17(1): 180, 2016 Aug 26.

Article in English | MEDLINE | ID: mdl-27566673

ABSTRACT

Duplex sequencing was originally developed to detect rare nucleotide polymorphisms normally obscured by the noise of high-throughput sequencing. Here we describe a new, streamlined, reference-free approach for the analysis of duplex sequencing data. We show that the approach performs well on simulated data and precisely reproduces previously published results, and we apply it to a newly produced dataset, enabling us to type low-frequency variants in human mitochondrial DNA. Finally, we provide all necessary tools as stand-alone components as well as integrate them into the Galaxy platform. All analyses performed in this manuscript can be repeated exactly as described at http://usegalaxy.org/duplex.

Subjects

DNA, Mitochondrial/genetics; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide/genetics; Software; Genomics; Humans; Sequence Analysis, DNA/methods

9.

Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA.

Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Stoler, Nicholas; McElhoe, Jennifer A; Dickins, Benjamin; Blankenberg, Daniel; Korneliussen, Thorfinn S; Chiaromonte, Francesca; Nielsen, Rasmus; Holland, Mitchell M; Paul, Ian M; Nekrutenko, Anton; Makova, Kateryna D.

Proc Natl Acad Sci U S A ; 111(43): 15474-9, 2014 Oct 28.

Article in English | MEDLINE | ID: mdl-25313049

ABSTRACT

The manifestation of mitochondrial DNA (mtDNA) diseases depends on the frequency of heteroplasmy (the presence of several alleles in an individual), yet its transmission across generations cannot be readily predicted owing to a lack of data on the size of the mtDNA bottleneck during oogenesis. For deleterious heteroplasmies, a severe bottleneck may abruptly transform a benign (low) frequency in a mother into a disease-causing (high) frequency in her child. Here we present a high-resolution study of heteroplasmy transmission conducted on blood and buccal mtDNA of 39 healthy mother-child pairs of European ancestry (a total of 156 samples, each sequenced at ∼20,000× per site). On average, each individual carried one heteroplasmy, and one in eight individuals carried a disease-associated heteroplasmy, with minor allele frequency ≥1%. We observed frequent drastic heteroplasmy frequency shifts between generations and estimated the effective size of the germ-line mtDNA bottleneck at only ∼30-35 (interquartile range from 9 to 141). Accounting for heteroplasmies, we estimated the mtDNA germ-line mutation rate at 1.3 × 10^-8 (interquartile range from 4.2 × 10^-9 to 4.1 × 10^-8) mutations per site per year, an order of magnitude higher than for nuclear DNA. Notably, we found a positive association between the number of heteroplasmies in a child and maternal age at fertilization, likely attributable to oocyte aging. This study also took advantage of droplet digital PCR (ddPCR) to validate heteroplasmies and confirm a de novo mutation. Our results can be used to predict the transmission of disease-causing mtDNA variants and illuminate evolutionary dynamics of the mitochondrial genome.

Subjects

DNA, Mitochondrial/genetics; Germ Cells/metabolism; Inheritance Patterns/genetics; Maternal Age; Age Factors; Child; Disease/genetics; Female; Gene Frequency/genetics; Humans; INDEL Mutation/genetics; Reproducibility of Results; Sequence Analysis, DNA

10.

Dissemination of scientific software with Galaxy ToolShed.

Blankenberg, Daniel; Von Kuster, Gregory; Bouvier, Emil; Baker, Dannon; Afgan, Enis; Stoler, Nicholas; Taylor, James; Nekrutenko, Anton.

Genome Biol ; 15(2): 403, 2014 Feb 20.

Article in English | MEDLINE | ID: mdl-25001293

ABSTRACT

The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it also revoked the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.

Subjects

Computational Biology; Internet; Software; Science

11.

Controlling for contamination in re-sequencing studies with a reproducible web-based phylogenetic approach.

Dickins, Benjamin; Rebolledo-Jaramillo, Boris; Su, Marcia Shu-Wei; Paul, Ian M; Blankenberg, Daniel; Stoler, Nicholas; Makova, Kateryna D; Nekrutenko, Anton.

Biotechniques ; 56(3): 134-141, 2014.

Article in English | MEDLINE | ID: mdl-24641477

ABSTRACT

Polymorphism discovery is a routine application of next-generation sequencing technology where multiple samples are sent to a service provider for library preparation, subsequent sequencing, and bioinformatic analyses. The decreasing cost and advances in multiplexing approaches have made it possible to analyze hundreds of samples at a reasonable cost. However, because of the manual steps involved in the initial processing of samples and handling of sequencing equipment, cross-contamination remains a significant challenge. It is especially problematic in cases where polymorphism frequencies do not adhere to diploid expectation, for example, heterogeneous tumor samples, organellar genomes, as well as during bacterial and viral sequencing. In these instances, low levels of contamination may be readily mistaken for polymorphisms, leading to false results. Here we describe practical steps designed to reliably detect contamination and uncover its origin, and also provide new, Galaxy-based, readily accessible computational tools and workflows for quality control. All results described in this report can be reproduced interactively on the web as described at http://usegalaxy.org/contamination.

Subjects

DNA Contamination; Sequence Analysis, DNA/methods; Sequence Analysis/methods; DNA, Mitochondrial/chemistry; DNA, Mitochondrial/genetics; Internet; Polymorphism, Genetic; Reproducibility of Results
